Abstract

In causal mediation analysis, nonparametric identification of the pure (natural) direct effect typically relies on, in addition to no unobserved pre-exposure confounding, the fundamental assumptions of (i) so-called "cross-world-counterfactuals" independence and (ii) no exposure-induced confounding. When the mediator is binary, bounds for partial identification have been given when neither assumption is made, or alternatively when assuming only (ii). We extend existing bounds to the case of a polytomous mediator, and provide bounds for the case assuming only (i). We apply these bounds to data from the Harvard PEPFAR program in Nigeria, where we evaluate the extent to which the effects of antiretroviral therapy on virological failure are mediated by a patient's adherence, and show that inference on this effect is somewhat sensitive to model assumptions.
1. INTRODUCTION


Causal mediation analysis seeks to determine the role that an intermediate variable plays in transmitting the effect from an exposure to an outcome. An indirect effect refers to the effect that goes through the intermediate variable in mediation analysis; a direct effect is a measure of the effect that does not. The study of causal mediation has in recent years enjoyed an explosion in popularity (Robins and Greenland 1992; Robins 1999, 2003; Pearl 2001; Avin et al. 2005; Taylor et al. 2005; Petersen et al. 2006; Ten Have et al. 2007; Albert 2008; Goetgeluk et al. 2008; van der Laan and Petersen 2008; VanderWeele 2009; VanderWeele and Vansteelandt 2009, 2010; Imai et al. 2010a,b; Albert and Nelson 2011; Tchetgen Tchetgen 2011; VanderWeele 2011; Albert 2012; Tchetgen Tchetgen and Shpitser 2012; Wang and Albert 2012; Shpitser 2013; Tchetgen Tchetgen 2013; Tchetgen Tchetgen and Shpitser 2014; Wang et al. 2013; Albert and Wang 2015; Hsu et al. 2015), not only in terms of theoretical developments, but also in practice, most notably in the fields of epidemiology and social sciences. This strand of work is based on ideas originating from Robins and Greenland (1992) and Pearl (2001), grounded in the language of potential outcomes (Splawa-Neyman et al. 1990; Rubin 1974, 1978), to give nonparametric definitions of effects involved in mediation analysis, allowing for settings where interactions and nonlinearities may be present.

Consider an intervention which sets the exposure of interest for all persons in the population to one of two possible values, a reference value or an active value. The total effect of such an intervention corresponds to the change of the counterfactual outcome mean if the exposure were set to the active value compared with if it were set to the reference value. Robins and Greenland (1992) formalized the concept of effect decomposition of the total effect into direct and indirect effects by defining pure direct and indirect effects. Pearl (2001) relabeled these effects as natural direct and indirect effects. The pure direct effect (PDE) corresponds to the change in the counterfactual outcome mean under an intervention which changes a person's exposure status from the reference value to the active value, while maintaining the person's mediator at the value it would have had under the exposure reference value. In contrast, the natural indirect effect (NIE) corresponds to the change in the average counterfactual outcome under an intervention that sets a person's exposure value to the active value, while changing the value of the mediator from the value it would have had under the reference exposure value to its value under the active exposure value. The PDE and NIE sum to give the total effect.

Identification of these natural effects has been somewhat controversial as it requires assumptions that may be overly restrictive for many applications in the health sciences. First, identification invokes a so-called cross-world-counterfactuals-independence assumption, which by virtue of involving counterfactuals under conflicting interventions on the exposure, can neither be enforced experimentally nor tested empirically (Pearl 2001; Robins and Richardson 2010). Secondly, a necessary assumption for identification rules out the presence of exposure-induced confounding of the mediator's effect on the outcome, even if all confounders are observed. While this assumption is in principle testable provided no unmeasured confounding, more often than not, post-exposure covariates are altogether ignored in routine application, in which case mediation analyses may be invalid. These issues have recently been considered, and some work has been done on partial or point identification under a weaker assumption. Specifically, on the one hand, Robins and Richardson (2010) and Tchetgen Tchetgen and VanderWeele (2014) provide conditions for point identification of the pure direct effect when a confounder is directly affected by the exposure. On the other hand, Robins and Richardson (2010) give bounds for the pure direct effect for a binary mediator without making the cross-world-counterfactual-independence assumption, but assuming no exposure-induced confounding of the mediator-outcome relation, and Tchetgen Tchetgen and Phiri (2014) extend these bounds to account for exposure-induced confounding. Bounds are commonly employed in causal inference when structural assumptions are not sufficiently strong to give point identification of a causal parameter of interest (Robins 1989; Balke and Pearl 1997; Zhang and Rubin 2003; Kaufman et al. 2005; Cheng and Small 2006; Cai et al. 2008; Sjölander 2009; Taguri and Chiba 2015). We build on this previous work to provide a number of new nonparametric bounds for the pure direct effect allowing for a polytomous mediator when either (i) exposure-induced confounding is present, or (ii) one does not assume that cross-world counterfactuals of the mediating and outcome variables are independent, or (iii) both (i) and (ii) hold.

We apply these bounds to data from the Harvard PEPFAR program in Nigeria, where we evaluate the extent to which the effects of antiretroviral therapy on virological failure are mediated by a patient's adherence. We show that PEPFAR results are sensitive to the choice of assumptions made; consequently, we counsel investigators employing these effects to exercise caution in considering the basis for point identification and to explicitly state the assumptions required for them to be valid. Where assumptions are empirically untestable, they should be argued for on the basis of scientific understanding, and ideally the alternative should be explored by employing the partial identification bounds given both here and elsewhere. While some work has been done to develop sensitivity analyses for unmeasured confounding of the mediator (Tchetgen Tchetgen 2011; Tchetgen Tchetgen and Shpitser 2012; Vansteelandt and VanderWeele 2012), sensitivity analyses for ranges of plausible associations between cross-world counterfactuals remain undeveloped. Further development of sensitivity analyses of both forms would be highly beneficial for practical use, and is fertile ground for future work. We hope that the work presented here will inspire deeper consideration and transparency regarding underlying identifying assumptions in the practice of mediation analysis.

Figure 1: (a) The three-node mediation directed acyclic graph in a setting with no confounding. The nodes represent random variables, and the arrows represent possible causal effects of one random variable on another. (b) The single-world intervention graph in the setting of (a) under the intervention setting $A$ to $\tilde{a}$ and $M$ to $\tilde{m}$. The black nodes represent random variables under this intervention, the red nodes represent the level an intervened random variable takes under this intervention, and the arrows represent possible causal effects of one variable under this intervention on another.


2. PRELIMINARIES 


By way of introduction, the directed acyclic graph (DAG) displayed in Fig. 1(a) illustrates the simplest possible mediation setting, where $A$ is defined to be the exposure taking either baseline value $a^*$ or comparison value $a$, $M$ is defined to be the (potential) mediator, and $Y$ is defined to be the outcome. This DAG assumes randomization of the exposure, which for expositional simplicity we maintain throughout. The graph also encodes no unobserved confounding of the effect of $M$ on $Y$ given $A$. The effect along the path $A \to Y$ on the diagram is generally referred to as direct with respect to $M$, and the effect along the path $A \to M \to Y$ on the diagram is generally referred to as indirect with respect to $M$.

Further elaboration of the specific type of direct and indirect effect under consideration necessitates counterfactual definitions. Let $Y(a)$ denote a subject's outcome if treatment $A$ were set, possibly contrary to fact, to $a$. In the context of mediation, there will also be potential outcomes for the intermediate variable. Counterfactuals $M(a)$ and $Y(a, m)$ are defined similarly. In order to link these with the observed data, we adopt the standard set of consistency assumptions that

if $A = a$, then $M(a) = M$ with probability one,

if $A = a$ and $M = m$, then $Y(a, m) = Y$ with probability one, and

if $A = a$, then $Y(a) = Y$ with probability one.

In terms of counterfactuals, the randomization assumption encoded by the DAG in Fig. 1(a) is $\{Y(a, m), M(a)\} \perp\!\!\!\perp A$ for all $a$ and $m$; the assumption of no unobserved confounding of $M$ given $A$ is $Y(a, m) \perp\!\!\!\perp M(a) \mid A = a$ for all $a$ and $m$. Finally, we will consider as well defined the nested counterfactual $Y\{a, M(a^*)\}$, i.e., the counterfactual outcome under an intervention which sets the exposure to the comparison value $a$, and the mediator to the value it would have taken under the conflicting baseline exposure value $a^*$.


We may now define the pure/natural direct effect and natural indirect effect (Robins and Greenland 1992; Pearl 2001), which form the following decomposition of the average causal effect:

\[
\begin{aligned}
\underbrace{E\{Y(a)\} - E\{Y(a^*)\}}_{\text{total effect}}
&= E[Y\{a, M(a)\}] - E[Y\{a^*, M(a^*)\}] \\
&= \underbrace{E[Y\{a, M(a)\}] - E[Y\{a, M(a^*)\}]}_{\text{natural indirect effect}}
 + \underbrace{E[Y\{a, M(a^*)\}] - E[Y\{a^*, M(a^*)\}]}_{\text{pure direct effect}}.
\end{aligned}
\]
The terms $E\{Y(a)\} = E[Y\{a, M(a)\}]$, for all $a$, are identified under randomization of $A$. The parameter $\gamma_0 = E[Y\{a, M(a^*)\}]$ would be identified if one were to interpret the DAG in Fig. 1(a) as a nonparametric structural equation model with independent errors (NPSEM-IE). Structural equations provide a nonparametric algebraic interpretation of this DAG corresponding to three equations, one for each variable in the graph. Each random variable on the graph is associated with a distinct, arbitrary function, denoted $g$, and a distinct random disturbance, denoted $\varepsilon$, each with a subscript corresponding to its respective random variable. Each variable is generated by its corresponding function, which depends only on all variables that affect it directly (i.e., its parents on the graph), and its corresponding random disturbance, as follows:


\[
\begin{aligned}
A &= g_A(\varepsilon_A), \\
M &= g_M(A, \varepsilon_M), \\
Y &= g_Y(A, M, \varepsilon_Y).
\end{aligned}
\]


Under particular interventions, these structural equations naturally encode dependencies of counterfactuals. Consider, for example, two interventions, one setting $A = a^*$, and another setting $A = a$ and $M = m$. The structural equations then become

\[
\begin{aligned}
A &= a^*, & A &= a, \\
M(a^*) &= g_M(a^*, \varepsilon_M), & M &= m, \\
Y(a^*) &= g_Y\{a^*, M(a^*), \varepsilon_Y\}, & Y(a, m) &= g_Y(a, m, \varepsilon_Y).
\end{aligned}
\]


This formulation places no a priori restriction on the distribution of counterfactuals. The key assumption of the NPSEM-IE is that the random disturbances are mutually independent. This allows us to make independence statements regarding counterfactuals under various, possibly conflicting interventions. In particular, this model implies that for all $m$, (i) $\{M(a), Y(a, m)\} \perp\!\!\!\perp A$, (ii) $Y(a, m) \perp\!\!\!\perp M(a) \mid A = a$, and (iii) $Y(a, m) \perp\!\!\!\perp M(a^*) \mid A = a$, which in turn suffice for identification of $\gamma_0$ (Pearl 2001). Independence statements such as (iii) are known as cross-world counterfactual statements if $a$ is not equal to $a^*$, due to their comparison of interventions that could never occur in the same world simultaneously. Independence condition (iii) can be seen to hold under the model by considering the NPSEM-IE under a specific intervention and noting that the only source of randomness in $Y(a, m) = g_Y(a, m, \varepsilon_Y)$ is $\varepsilon_Y$ and the only source of randomness in $M(a^*) = g_M(a^*, \varepsilon_M)$ is $\varepsilon_M$. Thus, the cross-world-counterfactual-independence statement follows directly from independence of exogenous disturbances. However, such an independence is neither experimentally verifiable nor enforceable (Robins and Richardson 2010).
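To make the role of independent disturbances concrete, the following is a minimal sketch under invented assumptions: the functions `g_M` and `g_Y` and the finite, uniformly distributed disturbances are hypothetical and chosen only for illustration. It enumerates the disturbances exactly and checks that, in such a model, the cross-world quantity $E[Y\{a, M(a^*)\}]$ agrees with the functional $\sum_m E(Y \mid M = m, A = a)\,\mathrm{pr}(M = m \mid A = a^*)$ built from single-world laws.

```python
from fractions import Fraction
from itertools import product

# Hypothetical disturbances: independent, each uniform over its listed values.
EPS_M = [0, 1, 2, 3]
EPS_Y = [0, 1, 2]

def g_M(a, e_m):
    # Hypothetical structural equation for a binary mediator.
    return 1 if e_m < 1 + 2 * a else 0

def g_Y(a, m, e_y):
    # Hypothetical structural equation for a binary outcome.
    return 1 if e_y < a + m else 0

def gamma0_counterfactual(a, a_star):
    """E[Y{a, M(a*)}] computed directly from the structural equations."""
    pairs = list(product(EPS_M, EPS_Y))
    total = sum(g_Y(a, g_M(a_star, e_m), e_y) for e_m, e_y in pairs)
    return Fraction(total, len(pairs))

def gamma0_mediation_formula(a, a_star):
    """sum_m E(Y | M=m, A=a) pr(M=m | A=a*), using only single-world laws."""
    pairs = list(product(EPS_M, EPS_Y))
    result = Fraction(0)
    for m in (0, 1):
        # Outcomes among units with M = m in the world where A is set to a.
        ys = [g_Y(a, m, e_y) for e_m, e_y in pairs if g_M(a, e_m) == m]
        # Law of the mediator in the world where A is set to a*.
        p_m = Fraction(sum(g_M(a_star, e_m) == m for e_m in EPS_M), len(EPS_M))
        if ys:  # skip mediator levels that never occur under A = a
            result += Fraction(sum(ys), len(ys)) * p_m
    return result

print(gamma0_counterfactual(1, 0), gamma0_mediation_formula(1, 0))
```

The agreement relies on the disturbances being enumerated as a product, that is, on their independence; if $\varepsilon_M$ and $\varepsilon_Y$ were made dependent, the two quantities would generally differ.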


This issue has been discussed extensively (Robins and Richardson 2010; Richardson and Robins 2013), and in large part motivated the development of the single-world intervention graphs (SWIGs) of Richardson and Robins (2013). These causal graphs manage to elucidate this issue by graphically representing the counterfactuals themselves, allowing independence statements of counterfactuals to be read directly from the graph. Consider the SWIG in Fig. 1(b). By d-separation, it is clear that $Y(a, m) \perp\!\!\!\perp M(a)$ for all $a$ and $m$; however, no such statement can be made from the graph about $Y(a, m)$ and $M(a^*)$ when $a \neq a^*$. Under this SWIG, independence between $Y(a, m)$ and $M(a^*)$ is not assumed, and hence $\gamma_0$ is not point identified. Robins and Richardson (2010) provide the following bounds for its partial identification in the setting where $M$ is binary and the SWIG independence assumptions $M(a) \perp\!\!\!\perp A$ and $Y(a, m) \perp\!\!\!\perp \{M(a), A\}$ hold for all $a$ and $m$:


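\[
\begin{aligned}
&\max\{0, \mathrm{pr}(M = 0 \mid A = a^*) + E(Y \mid M = 0, A = a) - 1\} \\
&\quad + \max\{0, \mathrm{pr}(M = 1 \mid A = a^*) + E(Y \mid M = 1, A = a) - 1\} \\
&\le \gamma_0 \le \\
&\min\{\mathrm{pr}(M = 0 \mid A = a^*), E(Y \mid M = 0, A = a)\} \\
&\quad + \min\{\mathrm{pr}(M = 1 \mid A = a^*), E(Y \mid M = 1, A = a)\}.
\end{aligned}
\]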

In Section 3, we extend this result to the setting of a polytomous $M$.
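The binary-mediator bounds above are simple enough to evaluate by hand, but a short sketch may help fix ideas. The function name and argument names below are hypothetical, the inputs are assumed to be the observed-data quantities $\mathrm{pr}(M = 1 \mid A = a^*)$ and $E(Y \mid M = m, A = a)$, and the outcome is assumed to lie in $[0, 1]$ (for instance, a binary $Y$).

```python
def rr_bounds_binary(p_m1_astar, ey_m0_a, ey_m1_a):
    """Bounds on gamma_0 for binary M, following the display above.

    p_m1_astar : pr(M = 1 | A = a*)
    ey_m0_a    : E(Y | M = 0, A = a)
    ey_m1_a    : E(Y | M = 1, A = a)
    """
    p = {0: 1.0 - p_m1_astar, 1: p_m1_astar}
    ey = {0: ey_m0_a, 1: ey_m1_a}
    # Lower bound: sum over m of max{0, pr(M=m | a*) + E(Y | m, a) - 1}.
    lower = sum(max(0.0, p[m] + ey[m] - 1.0) for m in (0, 1))
    # Upper bound: sum over m of min{pr(M=m | a*), E(Y | m, a)}.
    upper = sum(min(p[m], ey[m]) for m in (0, 1))
    return lower, upper

# Illustrative (made-up) inputs.
lower, upper = rr_bounds_binary(0.8, 0.9, 0.7)
point = 0.2 * 0.9 + 0.8 * 0.7  # value under cross-world independence
print(lower, point, upper)
```

With these made-up inputs the value that would be obtained under cross-world independence lies strictly inside the interval, which illustrates how much of the point identification is carried by that assumption.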

As previously mentioned, another often-overlooked condition required for identification of $\gamma_0$ is that there is no confounder of the mediator's effect on the outcome that is affected by the exposure. Such a confounder is present in the setting illustrated in the DAG in Fig. 2(a). Generally, even under an NPSEM-IE interpretation of this DAG, $\gamma_0$ will not be identified in this setting. This is readily seen by considering the following representation under this model given by Robins and Richardson (2010):

\[
\gamma_0 = \sum_{r, r^*} E\{E(Y \mid M, R = r, A = a) \mid R = r^*, A = a^*\}\, \mathrm{pr}\{R(a) = r, R(a^*) = r^*\}. \tag{1}
\]

Figure 2: (a) A mediation directed acyclic graph in which $R$ is an exposure-induced confounder. The nodes represent random variables, and the arrows represent possible causal effects of one random variable on another. (b) The single-world intervention graph in the setting of (a) that has been intervened on to set $A$ to $\tilde{a} \in \{a, a^*\}$ and $M$ to $\tilde{m}$. The black nodes represent random variables under this intervention, the red nodes represent the level an intervened random variable takes under this intervention, and the arrows represent possible causal effects of one variable under this intervention on another.


Clearly the joint probability term can never be identified from observed data, since we will never be able to observe $R(a)$ and $R(a^*)$ for the same individual.

A few conditions for identification have been proposed. Robins and Richardson (2010) give two. The first is that $R(a) \perp\!\!\!\perp R(a^*)$, in which case the troublesome term in (1) will factor, giving

\[
\gamma_0 = \sum_{r, r^*} E\{E(Y \mid M, R = r, A = a) \mid R = r^*, A = a^*\}\, \mathrm{pr}(R = r^* \mid A = a^*) \times \mathrm{pr}(R = r \mid A = a).
\]

It seems biologically unlikely, however, that in a scenario in which $A$ affects $R$, the counterfactual $R$ under $A = a$ would not be predictive of the counterfactual $R$ under $A = a^*$. The other condition is that the counterfactual confounder under one exposure value is a deterministic function of the counterfactual confounder under the other, i.e., $R(a) = g\{R(a^*)\}$. In this case,

\[
\gamma_0 = \sum_{r, r^*} E\{E(Y \mid M, R = r, A = a) \mid R = r^*, A = a^*\}\, \mathrm{pr}(R = r^* \mid A = a^*)\, I\{r = g(r^*)\}.
\]


The above assumption is implied by rank preservation (Robins and Richardson 2010), which is unlikely to hold in social and health sciences as it rules out individual-level effect heterogeneity (Tchetgen Tchetgen and VanderWeele 2014). As none of these conditions are experimentally verifiable, the authors themselves "do not advocate blithely adopting such assumptions in order to preserve identification of the PDE in [this setting]" (Robins and Richardson 2010).


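Tchetgen Tchetgen and VanderWeele (2014) give two testable conditions for identification of $\gamma_0$ when $R$ is present. The first is $A$-$R$ monotonicity, i.e., for Bernoulli $R$, $R(a) \ge R(a^*)$. If $R$ is a vector of Bernoulli random variables whose structural equations have independent errors, and if monotonicity holds for each element, then

\[
\gamma_0 = \sum_{r, r^*} E\{E(Y \mid M, R = r, A = a) \mid R = r^*, A = a^*\} \prod_{j} f_j(r_j, r_j^*, a, a^*),
\]

with the product taken over the elements $R_j$ of $R$, where

\[
f_j(r_j, r_j^*, a, a^*) =
\begin{cases}
\mathrm{pr}(R_j = 1 \mid A = a^*) & \text{if } r_j^* = r_j = 1, \\
\mathrm{pr}(R_j = 1 \mid A = a) - \mathrm{pr}(R_j = 1 \mid A = a^*) & \text{if } r_j^* = 0 \text{ and } r_j = 1, \\
0 & \text{if } r_j^* = 1 \text{ and } r_j = 0, \\
\mathrm{pr}(R_j = 0 \mid A = a) & \text{if } r_j^* = r_j = 0.
\end{cases}
\]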


Their second condition is no $M$-$R$ additive mean interaction, i.e.,

\[
E(Y \mid m, r, a) - E(Y \mid m^*, r, a) - E(Y \mid m, r^*, a) + E(Y \mid m^*, r^*, a) = 0,
\]

for all levels $m$ and $m^*$ of $M$ and $r$ and $r^*$ of $R$. For discrete $M$ and $R$, this yields

\[
\begin{aligned}
\gamma_0 &= \sum_{m} \{E(Y \mid m, r^*, a) - E(Y \mid m^*, r^*, a)\}\, \mathrm{pr}(M = m \mid A = a^*) \\
&\quad + \sum_{r} \{E(Y \mid m^*, r, a) - E(Y \mid m^*, r^*, a)\}\, \mathrm{pr}(R = r \mid A = a) \\
&\quad + E(Y \mid m^*, r^*, a).
\end{aligned}
\]


Eschewing the cross-world-counterfactual assumptions of the NPSEM-IE, Tchetgen Tchetgen and Phiri (2014) extend the bounds of Robins and Richardson (2010) to allow for the presence of an exposure-induced confounder when the mediator is binary:

\[
\begin{aligned}
&\max\Big\{0, \mathrm{pr}(M = 0 \mid A = a^*) + \sum_{r} E(Y \mid M = 0, R = r, A = a)\, \mathrm{pr}(R = r \mid A = a) - 1\Big\} \\
&\quad + \max\Big\{0, \mathrm{pr}(M = 1 \mid A = a^*) + \sum_{r} E(Y \mid M = 1, R = r, A = a)\, \mathrm{pr}(R = r \mid A = a) - 1\Big\} \\
&\le \gamma_0 \le \\
&\min\Big\{\mathrm{pr}(M = 0 \mid A = a^*), \sum_{r} E(Y \mid M = 0, R = r, A = a)\, \mathrm{pr}(R = r \mid A = a)\Big\} \\
&\quad + \min\Big\{\mathrm{pr}(M = 1 \mid A = a^*), \sum_{r} E(Y \mid M = 1, R = r, A = a)\, \mathrm{pr}(R = r \mid A = a)\Big\}.
\end{aligned}
\]

We extend these bounds as well to allow for polytomous $M$ in Section 3. Additionally, we construct bounds for $\gamma_0$ under an NPSEM-IE that account for a discrete exposure-induced confounder, but require no further assumption.

3. NEW PARTIAL IDENTIFICATION RESULTS 


We begin by extending the bounds of Robins and Richardson (2010) and Tchetgen Tchetgen and Phiri (2014) to settings with discrete mediator and outcome. Proofs can be found in the Appendix.


Theorem 1. Under the SWIG in either Fig. 1(b) or Fig. 2(b) with discrete $M$ and $Y$ and arbitrary $R$,

\[
\begin{aligned}
&\sum_{m, y} y \Big( \max[0, \mathrm{pr}\{M(a^*) = m\} + \mathrm{pr}\{Y(a, m) = y\} - 1]\, I(y > 0) \\
&\qquad\quad + \min[\mathrm{pr}\{M(a^*) = m\}, \mathrm{pr}\{Y(a, m) = y\}]\, I(y < 0) \Big) \\
&\le \gamma_0 \le \\
&\sum_{m, y} y \Big( \max[0, \mathrm{pr}\{M(a^*) = m\} + \mathrm{pr}\{Y(a, m) = y\} - 1]\, I(y < 0) \\
&\qquad\quad + \min[\mathrm{pr}\{M(a^*) = m\}, \mathrm{pr}\{Y(a, m) = y\}]\, I(y > 0) \Big).
\end{aligned}
\]

The upper and lower bounds coincide when $Y(a, m)$ or $M(a^*)$ is degenerate, which follows from the properties of joint probability mass functions. The upper bound is achieved only if $Y(a, m)$ and $M(a^*)$ are comonotone for each $m$, i.e., if $F_{Y(a,m), M(a^*)}(y, m) = \min\{F_{Y(a,m)}(y), F_{M(a^*)}(m)\}$ for each $m$, where $F_X(\cdot)$ denotes the joint (or marginal) cumulative distribution function of the random vector (or scalar) $X$. The lower bound is achieved only if they are countermonotone for each $m$, i.e., if $F_{Y(a,m), M(a^*)}(y, m) = \max\{0, F_{Y(a,m)}(y) + F_{M(a^*)}(m) - 1\}$ for each $m$. A straightforward application of the g-formula under the DAGs in Figs. 1 and 2 yields the following corollary:

Corollary 1. For polytomous $M$ and $Y$, $\gamma_0$ is partially identified under the SWIG in Fig. 1(b) by the bounds in Theorem 1 with $\mathrm{pr}\{M(a^*) = m\} = \mathrm{pr}(M = m \mid a^*)$ and $\mathrm{pr}\{Y(a, m) = y\} = \mathrm{pr}(Y = y \mid m, a)$. It is partially identified under the SWIG in Fig. 2(b) by the same bounds, but with $\mathrm{pr}\{M(a^*) = m\} = \mathrm{pr}(M = m \mid a^*)$ and $\mathrm{pr}\{Y(a, m) = y\} = \sum_{r} \mathrm{pr}(Y = y \mid m, r, a)\, \mathrm{pr}(R = r \mid a)$.
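As an illustration of how Theorem 1 and Corollary 1 might be evaluated, the sketch below computes the bounds from plug-in probabilities. The function names and the dictionary-based data layout are assumptions made here for illustration; they are not part of the original analysis. Each summand weights the Fréchet limits on $\mathrm{pr}\{Y(a, m) = y, M(a^*) = m\}$ by $y$, using the lower limit for positive $y$ and the upper limit for negative $y$ in the lower bound, and the reverse in the upper bound.

```python
def g_formula_outcome_law(p_y_given_mr, p_r):
    """pr{Y(a,m) = y} = sum_r pr(Y=y | m, r, a) pr(R=r | a), for one fixed m.

    p_y_given_mr : dict r -> dict y -> pr(Y = y | m, r, a)
    p_r          : dict r -> pr(R = r | a)
    """
    law = {}
    for r, pr_r in p_r.items():
        for y, py in p_y_given_mr[r].items():
            law[y] = law.get(y, 0.0) + py * pr_r
    return law

def theorem1_bounds(p_m, p_y):
    """Bounds on gamma_0 for discrete M and Y.

    p_m : dict m -> pr{M(a*) = m}
    p_y : dict m -> dict y -> pr{Y(a, m) = y}
    """
    lower = 0.0
    upper = 0.0
    for m, pm in p_m.items():
        for y, py in p_y[m].items():
            frechet_lo = max(0.0, pm + py - 1.0)  # smallest possible joint mass
            frechet_hi = min(pm, py)              # largest possible joint mass
            if y > 0:
                lower += y * frechet_lo
                upper += y * frechet_hi
            elif y < 0:
                lower += y * frechet_hi
                upper += y * frechet_lo
    return lower, upper

# Made-up inputs with a binary mediator and binary outcome.
p_m = {0: 0.2, 1: 0.8}
p_y = {0: {0: 0.1, 1: 0.9}, 1: {0: 0.3, 1: 0.7}}
print(theorem1_bounds(p_m, p_y))
```

For the setting of Fig. 2(b), one would first pass the stratum-specific outcome laws through `g_formula_outcome_law` for each mediator level and then call `theorem1_bounds` on the result.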

The second part of the corollary continues to hold even if there were a hidden common cause of $R$ and $Y$, as in Fig. 3, since the same g-formula applies in this setting. Whereas the previous results invoked no cross-world-counterfactual independences under the SWIG interpretation of the DAG in Fig. 2(a), sharper bounds are available under Pearl's NPSEM-IE interpretation of these DAGs, as derived in the following result.





Figure 3: (a) A mediation directed acyclic graph in which an unobserved variable $H$ affects $R$, an exposure-induced confounder, and $Y$. The black nodes represent observed random variables, and the arrows represent possible causal effects of one random variable on another. (b) The single-world intervention graph in the setting of (a) that has been intervened on to set $A$ to $\tilde{a} \in \{a, a^*\}$ and $M$ to $\tilde{m}$. The black nodes represent random variables under this intervention, the red nodes represent the level an intervened random variable takes under this intervention, and the arrows represent possible causal effects of one variable under this intervention on another. In each panel, the gray node represents a hidden random variable.


Theorem 2. For discrete $R$ taking values in $\{1, \ldots, p\}$, let $B$ be the $p^2 \times (p-1)^2$ matrix

\[
B = \begin{pmatrix}
I_{p-1} & 0_{(p-1)\times(p-1)} & \cdots & 0_{(p-1)\times(p-1)} \\
-1_{p-1}^{T} & 0_{p-1}^{T} & \cdots & 0_{p-1}^{T} \\
0_{(p-1)\times(p-1)} & I_{p-1} & \cdots & 0_{(p-1)\times(p-1)} \\
0_{p-1}^{T} & -1_{p-1}^{T} & \cdots & 0_{p-1}^{T} \\
\vdots & \vdots & \ddots & \vdots \\
0_{(p-1)\times(p-1)} & 0_{(p-1)\times(p-1)} & \cdots & I_{p-1} \\
0_{p-1}^{T} & 0_{p-1}^{T} & \cdots & -1_{p-1}^{T} \\
-I_{p-1} & -I_{p-1} & \cdots & -I_{p-1} \\
1_{p-1}^{T} & 1_{p-1}^{T} & \cdots & 1_{p-1}^{T}
\end{pmatrix},
\]

$d$ be the $p^2$-dimensional vector

\[
d = \begin{pmatrix}
0_{p-1} \\
\mathrm{pr}(R = 1 \mid A = a) \\
0_{p-1} \\
\mathrm{pr}(R = 2 \mid A = a) \\
\vdots \\
0_{p-1} \\
\mathrm{pr}(R = p-1 \mid A = a) \\
\mathrm{pr}(R = 1 \mid A = a^*) \\
\mathrm{pr}(R = 2 \mid A = a^*) \\
\vdots \\
\mathrm{pr}(R = p-1 \mid A = a^*) \\
\mathrm{pr}(R = p \mid A = a) + \mathrm{pr}(R = p \mid A = a^*) - 1
\end{pmatrix},
\]

and $x$ be the vectorization of the matrix $[E\{E(Y \mid M, R = r, A = a) \mid R = r^*, A = a^*\}]_{r, r^*}$. Under a NPSEM-IE corresponding to the DAG in Fig. 2(a) where $M$ and $Y$ can be either continuous or discrete, $\gamma_0$ is partially identified by $[x^{T}(B\delta_L + d),\, x^{T}(B\delta_U + d)]$, where $\delta_L$ and $\delta_U$ are the minimizing and maximizing solutions respectively to the linear programming problem with objective function $x^{T} B \delta$ subject to the constraints that, for every pair $(r, r^*)$ with $r, r^* \in \{1, \ldots, p\}$, the corresponding element of $B\delta + d$ satisfies

\[
\begin{aligned}
(B\delta + d)_{r, r^*} &\le \min\{\mathrm{pr}(R = r \mid A = a), \mathrm{pr}(R = r^* \mid A = a^*)\}, \\
-(B\delta + d)_{r, r^*} &\le \min\{0, 1 - \mathrm{pr}(R = r \mid A = a) - \mathrm{pr}(R = r^* \mid A = a^*)\},
\end{aligned}
\]

and $\delta \ge 0$.
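The parametrization in Theorem 2 can be read as follows: $\delta$ collects the $(p-1)^2$ free cells of the joint distribution of $\{R(a), R(a^*)\}$, and $B\delta + d$ fills in the remaining cells from the two identified marginals. The sketch below builds $B$ and $d$ under that reading and checks that a given $\delta$ yields a joint distribution with the intended marginals. The function names, zero-based indexing, and the ordering of cells (with $r^*$ varying fastest within each $r$) are assumptions made here for illustration.

```python
from fractions import Fraction

def build_B_d(p_a, p_astar):
    """Construct B and d for R with p levels.

    p_a[i]     : pr(R = i + 1 | A = a)
    p_astar[j] : pr(R = j + 1 | A = a*)
    Cells are ordered as (r, r*) with r* varying fastest (assumed ordering).
    """
    p = len(p_a)
    k = p - 1
    B = [[0] * (k * k) for _ in range(p * p)]
    d = [Fraction(0)] * (p * p)
    for r in range(p):
        for s in range(p):  # s indexes r*
            row = r * p + s
            if r < k and s < k:
                B[row][r * k + s] = 1            # free cell: delta itself
            elif r < k and s == k:
                for j in range(k):
                    B[row][r * k + j] = -1       # row marginal minus free cells
                d[row] = p_a[r]
            elif r == k and s < k:
                for i in range(k):
                    B[row][i * k + s] = -1       # column marginal minus free cells
                d[row] = p_astar[s]
            else:
                for c in range(k * k):
                    B[row][c] = 1                # last cell closes the table
                d[row] = p_a[k] + p_astar[k] - 1
    return B, d

def joint_from_delta(B, d, delta):
    """Return B @ delta + d as a flat list of joint probabilities."""
    return [sum(b * x for b, x in zip(row, delta)) + di for row, di in zip(B, d)]

# Made-up marginals; delta taken from the independent coupling as a check.
p_a = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
p_astar = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]
delta = [p_a[i] * p_astar[j] for i in range(2) for j in range(2)]
B, d = build_B_d(p_a, p_astar)
joint = joint_from_delta(B, d, delta)
print(joint)
```

The linear program itself would then minimize and maximize $x^{T} B \delta$ over the feasible $\delta$ with any standard solver; that step is omitted here to keep the sketch dependency-free.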


Similar to the previous result, these bounds coincide if either $R(a)$ or $R(a^*)$ is degenerate. The upper bound is achieved when $R(a)$ and $R(a^*)$ are comonotone; the lower bound is achieved when they are countermonotone. While these bounds are not available in closed form, they can be readily solved using standard software, such as with the lp_solve function, which uses the revised simplex method and is accessible from a number of languages, including R, MATLAB, Python, and C. While the method used by this software is not guaranteed to converge at a polynomial rate (Klee and Minty 1970), it is quite efficient in most cases (Schrijver 1998). The following corollary shows that these bounds reduce to a closed form when $R$ is binary.


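Corollary 2. Under a NPSEM-IE corresponding to the DAG in Fig. 2(a) with binary $R$,

\[
\begin{aligned}
&\min_{\pi_{11} \in \Pi} \sum_{r, r^*} E\{E(Y \mid M, R = r, A = a) \mid R = r^*, A = a^*\}\, h(r, r^*, \pi_{11}) \\
&\le \gamma_0 \le \\
&\max_{\pi_{11} \in \Pi} \sum_{r, r^*} E\{E(Y \mid M, R = r, A = a) \mid R = r^*, A = a^*\}\, h(r, r^*, \pi_{11}),
\end{aligned}
\]

where $\Pi$ is the set

\[
\Big\{ \max\{0, \mathrm{pr}(R = 1 \mid A = a) + \mathrm{pr}(R = 1 \mid A = a^*) - 1\},\ \min\{\mathrm{pr}(R = 1 \mid A = a), \mathrm{pr}(R = 1 \mid A = a^*)\} \Big\}
\]

and

\[
h(r, r^*, \pi_{11}) =
\begin{cases}
\pi_{11} & \text{if } r^* = r = 1, \\
\mathrm{pr}(R = 1 \mid A = a) - \pi_{11} & \text{if } r^* = 0 \text{ and } r = 1, \\
\mathrm{pr}(R = 1 \mid A = a^*) - \pi_{11} & \text{if } r^* = 1 \text{ and } r = 0, \\
1 - \mathrm{pr}(R = 1 \mid A = a) - \mathrm{pr}(R = 1 \mid A = a^*) + \pi_{11} & \text{if } r^* = r = 0.
\end{cases}
\]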
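Because the objective in Corollary 2 is linear in $\pi_{11}$, only the two endpoints in $\Pi$ need to be evaluated. The following sketch does exactly that; the function name and the dictionary `x`, keyed by the pair $(r, r^*)$ and holding $E\{E(Y \mid M, R = r, A = a) \mid R = r^*, A = a^*\}$, are hypothetical names introduced for illustration.

```python
from fractions import Fraction

def corollary2_bounds(x, p1_a, p1_astar):
    """Closed-form bounds on gamma_0 for binary R.

    x        : dict (r, r_star) -> E{E(Y | M, R=r, A=a) | R=r_star, A=a*}
    p1_a     : pr(R = 1 | A = a)
    p1_astar : pr(R = 1 | A = a*)
    """
    def h(r, r_star, pi11):
        # Joint mass of {R(a) = r, R(a*) = r_star} given pi11 = pr{R(a)=1, R(a*)=1}.
        if r == 1 and r_star == 1:
            return pi11
        if r == 1 and r_star == 0:
            return p1_a - pi11
        if r == 0 and r_star == 1:
            return p1_astar - pi11
        return 1 - p1_a - p1_astar + pi11

    # The two Frechet endpoints for pi11.
    endpoints = (max(0, p1_a + p1_astar - 1), min(p1_a, p1_astar))
    values = [
        sum(x[(r, s)] * h(r, s, pi11) for r in (0, 1) for s in (0, 1))
        for pi11 in endpoints
    ]
    return min(values), max(values)

# Made-up inputs, kept as exact fractions.
x = {
    (1, 1): Fraction(1, 2),
    (1, 0): Fraction(1, 4),
    (0, 1): Fraction(3, 4),
    (0, 0): Fraction(1, 10),
}
print(corollary2_bounds(x, Fraction(3, 5), Fraction(2, 5)))
```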


Under $A$-$R$ monotonicity with binary $R$, the identifying functional given by Tchetgen Tchetgen and VanderWeele (2014) is recovered at the upper bound in Corollary 2. All results given here can be extended to settings with observed pre-exposure confounders, which we denote $C$. In Corollary 1, one must first perform conditional inference given $C$, then subsequently average over the conditional bounds. This is in fact valid due to Jensen's inequality, because the constraints on the marginal joint probabilities are already implied by the constraints enforced on the conditional joint distributions, so no further constraints need be considered. However, Jensen's inequality does not apply in the case of Theorem 2, so controlling for $C$ requires estimating two pairs of candidate bounds and selecting the larger of the lower bounds and the smaller of the upper bounds. When $p$ is of moderate size, $\delta$ can be solved for each covariate pattern of $C$, i.e., without modeling the dependence of the cross-world-counterfactual joint distribution on $C$. Averaging the resulting conditional bounds gives the first pair of bounds. The second pair results from replacing each probability in the theorem with an average over the probabilities conditional on $C$ and doing the same with $x$.


4. APPLICATION TO HARVARD PEPFAR DATA SET 


We now consider an application to a data set collected by the Harvard President's Emergency Plan for AIDS Relief (PEPFAR) program in Nigeria. The data set consists of previously antiretroviral therapy (ART)-naive, HIV-1 infected adult patients who began ART in the program and were followed at least one year following initiation. Patients without reliable viral load data at two of the hospitals were excluded. Only complete cases initially prescribed either TDF+3TC/FTC+NVP or AZT+3TC+NVP¹ were considered for this analysis. Thus, the data set we consider consists of 6627 patients, 1919 of whom were prescribed TDF+3TC/FTC+NVP, and the remaining 4708 prescribed AZT+3TC+NVP.

Evidence has accumulated of a differential effect on virologic failure between these two first-line antiretroviral treatment regimens (Tang et al. 2012). Virologic failure is defined by the World Health Organization as repeat viral load above 1000 copies/mL. We base this on measurements at 12 and 18 months of ART duration in our analysis.

A natural question of scientific interest is what role adherence plays in mediating this differential effect. We are primarily interested in learning about the scientific mechanism of this effect on the individual level. The natural indirect effect best captures this mechanism in that it captures an isolated effect difference mediated by adherence by, in a sense, deactivating effect differences along all other possible causal pathways. We specifically examine the effect through adherence over the second six months since treatment assignment, i.e., the six months prior to the first viral load measurement. Identification is complicated by the presence of treatment toxicity, which clearly affects adherence directly, and has the potential to modify the effect of the treatment assignment on virologic failure. Thus, toxicity measured at six months after treatment assignment is an exposure-induced confounder of the effect of the mediator on the outcome. Further, toxicity and virologic failure are likely to be rendered dependent by unobserved underlying biological common causes as in Fig. 3, where $H$ represents these hidden biological mechanisms. Because we define the mediator to be adherence over the second six months, adherence over the first six months is also an exposure-induced confounder along with toxicity, and must be accounted for. Had we defined the mediator to be adherence over the full year, measurement of the mediator and toxicity would have overlapped, violating the principle of temporal ordering.

¹ 3TC = lamivudine, AZT = zidovudine, FTC = emtricitabine, NVP = nevirapine, TDF = tenofovir

Let $C$ denote the vector consisting of baseline covariates sex, age, marital status, WHO stage, hepatitis C virus, hepatitis B virus, CD4+ cell count, and viral load. Let $A$ be an indicator of ART assignment taking levels $a^*$ for TDF+3TC/FTC+NVP and $a$ for AZT+3TC+NVP; $R$ be a vector of two indicator variables, one of the presence of any lab toxicity, and one of adherence exceeding 95%, both over the first six months following initiation of therapy; $M$ be an indicator of adherence exceeding 95% over the subsequent six months; and $Y$ be an indicator of virologic failure at one year, i.e., repeat viral load above 1000 copies/mL at one year and at 18 months.

Here we estimate the natural indirect effect of A on F through M, as defined above, on the 
risk difference scale using the various sets of identifying assumptions given above. Throughout, 
inference is performed using maximum likelihood for point estimation and a weighted bootstrap 
( Rao and Zhao[ 1992[ van der Vaart and Wellner 19961 for confidence intervals, which appro¬ 
priately accounts for the rare outcome. The results are summarized in Fig. It is immediately 
apparent that inference is sensitive to which identifying assumptions are made. Consider an 
investigator who might be willing to rely on cross-world-counterfactual independences. If she 
decides to ignore the presence of toxicity, she might likely conclude that there is a very small, 
yet significant negative indirect effect. Conversely, were she to make the no M-R interaction 
assumption, she would find a significant positive indirect effect with considerable uncertainty. 
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Ignore, IE Ignore, None Monoton., IE No M’R, IE None, None None, IE 

Assumptions regarding R, Assumptions regarding cross-world counterfactuals 


Figure 4: A plot showing the estimated natural indirect effect of ART assignment on virologic
failure with respect to adherence under the various assumptions. The assumptions vary across the
horizontal axis, with the first part of the label indicating the assumption regarding the
exposure-induced confounder, R, and the second part indicating the assumption regarding cross-world
counterfactuals. For the assumptions regarding R, “Ignore” means that the presence of R is
ignored altogether, “Monoton.” means the A-R monotonicity assumption in Section 1, “No M*R”
means the no M-R interaction assumption in Section 1, and “None” means that R was accounted
for without additional assumptions. For the assumptions regarding cross-world counterfactuals,
“IE” means a npsem-ie was assumed, and “None” means no cross-world-counterfactual
independences were assumed. When the assumptions give partial identification, the two dots
represent the point estimates of the upper and lower bounds for the natural indirect effect, and the
vertical bar represents the bootstrap 95% confidence interval for the interval defined by these
bounds. When the assumptions give full identification, the single dot represents the point estimate
of the natural indirect effect, and the vertical bar represents its bootstrap 95% confidence interval.
In fact, an empirical test of this assumption reveals that it is unlikely to apply. Likewise, the
data suggest that the required assumption of independent errors of the components of R is also
unlikely to hold. Nonetheless, we present both results for the sake of comparison. Results are
fairly imprecise under monotonicity, and do not show a significant effect.

Another investigator unwilling to impose cross-world-counterfactual-independence assumptions
is left with little to say, as the bounds are wide and include the null hypothesis of no NIE,
regardless of how toxicity is handled. Interestingly, the bounds that result from making no
assumptions about the joint distribution of the cross-world R counterfactuals are narrower than the
bounds that result from ignoring R. That is, the bounds themselves appear narrower; the
variances of the interval estimates appear to be comparable. This is because even though we do not
impose any restrictions on the distribution of R or its counterfactuals a priori, observing R is
clearly informative. The bounds accounting for R have the added advantage of being the only
identifying formula that remains valid when toxicity and virologic suppression are affected by an
unobserved common cause, as in Fig.

Finally, incorporating R results in narrower interval estimates than not imposing the npsem-ie,
even if R were ignored. Thus, cross-world-counterfactual independences appear to have stronger
empirical implications in the current analysis than assumptions regarding exposure-induced
confounders. The general trend in these results is that little is gained in terms of precision by
assumptions regarding R. In fact, the confidence interval for the bounds resulting from the independent
errors assumption and no assumption regarding R is narrower than the confidence interval for the
estimate that results from assuming monotonicity, despite the fact that the NIE is point-identified
in the latter case. The naive assumption that R is not a confounder is the only assumption about
R under which precision is gained.


APPENDIX 


Proofs of theorems 

Proof of Theorem. For each level $m$ and $y$, define $\pi_1(m, y) = \mathrm{pr}\{Y(a, m) = y\}$ and $\pi_2(m) = \mathrm{pr}\{M(a^*) = m\}$. There exist $U_1(m, y), U_2(m) \sim \mathcal{U}(0, 1)$ such that $I\{Y(a, m) = y\} = I\{U_1(m, y) < \pi_1(m, y)\}$ and $I\{M(a^*) = m\} = I\{U_2(m) < \pi_2(m)\}$. The joint distribution $F_{U_1(m,y),U_2(m)}$, then, is a bivariate copula, for which Fréchet-Hoeffding sharp bounds exist. Applying these to $\mathrm{pr}\{Y(a, m) = y, M(a^*) = m\} = F_{U_1(m,y),U_2(m)}\{\pi_1(m, y), \pi_2(m)\}$, we have

$$
\max\left[0,\ \mathrm{pr}\{M(a^*) = m\} + \mathrm{pr}\{Y(a, m) = y\} - 1\right]
\leq \mathrm{pr}\{Y(a, m) = y, M(a^*) = m\}
\leq \min\left[\mathrm{pr}\{M(a^*) = m\},\ \mathrm{pr}\{Y(a, m) = y\}\right].
$$

Applying these bounds to each summand in

$$
E[Y\{a, M(a^*)\}] = \sum_{m,y} y\, \mathrm{pr}\{Y(a, m) = y, M(a^*) = m\}
$$

yields the result. □
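
The termwise bounding step in this proof can be sketched numerically. The following is a minimal illustration only, not the authors' implementation: the function names and the toy probabilities are hypothetical, and the inputs pr{Y(a, m) = y} and pr{M(a*) = m} are assumed to have been identified already. Negative outcome values, if any, are handled by swapping the roles of the lower and upper Fréchet-Hoeffding bounds, which is an assumption of this sketch rather than something stated in the proof.

```python
def frechet_bounds(p, q):
    """Frechet-Hoeffding bounds on pr(A and B) given pr(A) = p and pr(B) = q."""
    return max(0.0, p + q - 1.0), min(p, q)


def bound_cross_world_mean(pi1, pi2):
    """Bound sum over (m, y) of y * pr{Y(a, m) = y, M(a*) = m}.

    pi1[m][y] = pr{Y(a, m) = y}  (hypothetical input layout: dict of dicts)
    pi2[m]    = pr{M(a*) = m}
    """
    lower = upper = 0.0
    for m, p_m in pi2.items():
        for y, p_y in pi1[m].items():
            lo, hi = frechet_bounds(p_y, p_m)
            if y >= 0:
                lower += y * lo
                upper += y * hi
            else:
                # A negative weight reverses which joint bound is extreme.
                lower += y * hi
                upper += y * lo
    return lower, upper


# Toy numbers (invented for illustration): binary outcome, binary mediator.
pi2 = {0: 0.2, 1: 0.8}
pi1 = {0: {0: 0.1, 1: 0.9}, 1: {0: 0.5, 1: 0.5}}
print(bound_cross_world_mean(pi1, pi2))  # approximately (0.4, 0.7)
```

With a binary outcome only the y = 1 terms contribute, so the bounds reduce to sums of Fréchet-Hoeffding bounds over the mediator levels.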

Proof of Theorem. Let $\pi_{r,r^*} = \mathrm{pr}\{R(a) = r, R(a^*) = r^*\}$, $\pi$ be the vectorization of the matrix $[\pi_{r,r^*}]$, and $\delta$ be the vectorization of the matrix $[\pi_{r,r^*}]_{-p,-p}$, i.e., the vectorization of the matrix $[\pi_{r,r^*}]$ with row $p$ and column $p$ removed. Equation (1) can now be expressed as $\gamma_0 = x^{T}\pi$, which is identified in $x$, but not $\pi$. Conditional on the marginal probabilities, which are identified, the joint probabilities have $(p - 1)^2$ degrees of freedom, and can be expressed as $\pi = B\delta + d$. Since $x^{T}B\delta$ is linear in $\delta$ and each element of $\delta$ is constrained by

$$
\max\{0,\ \mathrm{pr}(R = r \mid A = a) + \mathrm{pr}(R = r^* \mid A = a^*) - 1\}
\leq \pi_{r,r^*} \leq
\min\{\mathrm{pr}(R = r \mid A = a),\ \mathrm{pr}(R = r^* \mid A = a^*)\},
$$

the proposed linear programming problem will yield the $\delta$ that optimizes $x^{T}B\delta$, and hence $x^{T}(B\delta + d)$. Thus, $\gamma_0$ will be bounded by $x^{T}(B\delta + d)$ evaluated at the minimizing and maximizing linear programming solutions $\delta_l$ and $\delta_u$. □
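
As an illustration only (not part of the original proof), the reparametrization can be written out for the simplest case, p = 2, where a single free parameter remains. The shorthand $\alpha = \mathrm{pr}(R = 1 \mid A = a)$ and $\beta = \mathrm{pr}(R = 1 \mid A = a^*)$ is introduced here for convenience, and a column-major vectorization order is assumed.

```latex
% Worked example for p = 2 (illustrative; \alpha, \beta and the ordering are assumptions).
\[
\pi =
\begin{pmatrix} \pi_{1,1} \\ \pi_{2,1} \\ \pi_{1,2} \\ \pi_{2,2} \end{pmatrix}
=
\underbrace{\begin{pmatrix} 1 \\ -1 \\ -1 \\ 1 \end{pmatrix}}_{B} \delta
+
\underbrace{\begin{pmatrix} 0 \\ \beta \\ \alpha \\ 1 - \alpha - \beta \end{pmatrix}}_{d},
\qquad \delta = \pi_{1,1},
\]
\[
x^{T}\pi = x^{T}d + \left(x_{1,1} - x_{2,1} - x_{1,2} + x_{2,2}\right)\delta,
\qquad
\max\{0,\ \alpha + \beta - 1\} \leq \delta \leq \min\{\alpha,\ \beta\}.
\]
```

Because the objective is linear in the scalar $\delta$, its minimum and maximum over this interval are attained at the endpoints, so in this special case the linear program reduces to evaluating $x^{T}(B\delta + d)$ at the two Fréchet-Hoeffding limits.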